Add graph validation and statistics logging for debugging graph construction #519

Open
princekumarlahon wants to merge 4 commits into mllam:main from princekumarlahon:graph-validation

@princekumarlahon

Describe your changes

This PR introduces lightweight graph validation and diagnostic utilities to improve debugging during graph construction.

It adds two helper functions:

  • validate_graph
    Performs sanity checks on graph structure (shape, empty edges, invalid indices) and fails early with clear error messages.

  • compute_graph_stats
    Logs useful statistics about the graph, including number of nodes, edges, degree distribution, and isolated nodes.
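
The checks described for validate_graph could look roughly like the sketch below. This is an illustration, not the code from the PR: it uses plain Python lists in place of the tensor-based edge_index so that it runs without dependencies, and the signature and most error messages are assumptions.

```python
def validate_graph(edge_index, num_nodes, name="graph"):
    """Illustrative sanity checks on a (2, num_edges) edge list.

    Assumption: edge_index is a pair [senders, receivers] of integer lists.
    The PR itself operates on tensors; this is a list-based stand-in.
    """
    # Shape check: two rows of equal length.
    if len(edge_index) != 2 or len(edge_index[0]) != len(edge_index[1]):
        raise ValueError(f"[{name}] edge_index must have shape (2, num_edges)")

    senders, receivers = edge_index

    # Empty-edge check: a graph without edges is almost certainly a bug.
    if len(senders) == 0:
        raise ValueError(f"[{name}] graph has no edges")

    # Index-range checks over both rows.
    all_ids = list(senders) + list(receivers)
    if min(all_ids) < 0:
        raise ValueError(f"[{name}] found negative node indices")
    if max(all_ids) >= num_nodes:
        raise ValueError(
            f"[{name}] node index {max(all_ids)} out of range for {num_nodes} nodes"
        )


# A valid 3-node chain passes silently.
validate_graph([[0, 1], [1, 2]], num_nodes=3, name="m2m")
```

Raising ValueError with the graph name in the message is one way to make the failure point obvious when several graphs (g2m, m2g, m2m) are built in one run.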

These utilities are integrated into the graph creation pipeline for:

  • grid-to-mesh (g2m)
  • mesh-to-grid (m2g)
  • mesh-to-mesh (m2m, per level)

Motivation and context

While working with graph construction, it can be difficult to quickly verify whether a generated graph is valid or understand its structure.

This change helps by:

  • catching invalid graphs early
  • providing quick visibility into connectivity patterns
  • making debugging easier when experimenting with graph configurations

Dependencies

No new dependencies are introduced.


Issue Link

closes #518


Type of change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📖 Documentation (Addition or improvements to documentation)

Checklist before requesting a review

  • My branch is up-to-date with the target branch
  • I have performed a self-review of my code
  • I have added docstrings for new/modified functions
  • I have added inline comments where necessary
  • I have updated the README (not required for this change)
  • I have added tests (validated locally with synthetic graphs)
  • I have given the PR a clear and descriptive name
  • Reviewer/assignee will be added by maintainers if needed

Checklist for reviewers

  • the code is readable
  • the code is well tested
  • the code is documented (including return types and parameters)
  • the code is easy to maintain

Author checklist after completed review

  • Add CHANGELOG entry (will update after review if accepted)

Checklist for assignee

  • PR is up to date with the base branch
  • tests pass
  • PR is assigned to the next milestone
  • changelog entry is added

@princekumarlahon
Author

Happy to iterate on this if any changes are needed!

Contributor

@kshirajahere left a comment

Hey, thanks for your work on this. I found a couple of things that are worth clearing up, and I’ve left inline comments with the details.
The PR also includes an unrelated CustomMLFlowLogger type-hint cleanup in custom_loggers.py.


import torch

degrees = torch.bincount(edge_index[1], minlength=num_nodes)
Contributor

These stats use only the in-degree (edge_index[1]), but the log messages say "degree" and "isolated nodes" as if they described the whole graph. In a directed graph such as g2m, source-only nodes are counted as isolated even though they have outgoing edges, so the debug output is misleading.
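
A small example makes the problem concrete. The sketch below assumes a toy g2m-style graph in which nodes 0-2 only send and nodes 3-4 only receive, all in one shared index space. That layout is an assumption for illustration. The `bincount` helper is a list-based stand-in for `torch.bincount(ids, minlength=...)`, valid for in-range, non-negative ids.

```python
def bincount(ids, minlength):
    """List-based stand-in for torch.bincount on in-range, non-negative ids."""
    counts = [0] * minlength
    for i in ids:
        counts[i] += 1
    return counts


# Toy directed graph: nodes 0, 1, 2 are sources; nodes 3, 4 are targets.
senders = [0, 1, 2]
receivers = [3, 3, 4]
num_nodes = 5

in_degree = bincount(receivers, num_nodes)   # [0, 0, 0, 2, 1]
out_degree = bincount(senders, num_nodes)    # [1, 1, 1, 0, 0]
total_degree = [i + o for i, o in zip(in_degree, out_degree)]

# Counting zeros in the in-degree flags every source node as "isolated".
isolated_by_in_degree = [n for n, d in enumerate(in_degree) if d == 0]
# Counting zeros in the total degree flags only truly unconnected nodes.
isolated_by_total = [n for n, d in enumerate(total_degree) if d == 0]

print(isolated_by_in_degree)  # [0, 1, 2]
print(isolated_by_total)      # []
```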

if edge_index.min() < 0:
    raise ValueError(f"[{name}] found negative node indices")

if edge_index.max() >= num_nodes:
Contributor

This check is incompatible with the hierarchical m2m graphs below, because from_networkx_with_start_index() intentionally keeps globally offset node ids. For level 1+ edge_index.max() can be much larger than num_nodes even though the graph is valid, so this now breaks hierarchical graph generation.
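
To illustrate the failure mode, here is a minimal sketch with made-up numbers: a level with 4 nodes whose ids start at 100 because earlier levels occupy ids 0-99. The `ids_in_range` helper and its `start_index` parameter are hypothetical and only show one possible way to make the range check offset-aware. They are not part of the PR or of from_networkx_with_start_index().

```python
def ids_in_range(edge_index, num_nodes, start_index=0):
    """Return True if every node id lies in [start_index, start_index + num_nodes).

    Hypothetical helper: start_index models the global offset that a
    hierarchical level's node ids may carry.
    """
    senders, receivers = edge_index
    all_ids = list(senders) + list(receivers)
    if not all_ids:
        return True  # nothing to check on an empty edge list
    return min(all_ids) >= start_index and max(all_ids) < start_index + num_nodes


# Assumed example: level 1 has 4 nodes with globally offset ids 100..103.
level1_edges = [[100, 101, 102], [101, 102, 103]]

# The strict check (max < num_nodes) rejects this valid graph.
print(ids_in_range(level1_edges, num_nodes=4))                   # False
# An offset-aware check accepts it.
print(ids_in_range(level1_edges, num_nodes=4, start_index=100))  # True
```

Whether the validator should take an offset, or whether ids should be re-based before validation, is a design choice left to the PR discussion.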

Author

Thanks for the feedback! I've updated the stats to use total degree (in + out) and added separate logging for in-degree and out-degree. I also added a small guard for empty graphs to avoid edge-case issues. Let me know if this looks good!
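
For readers following the thread, a change of this kind might look roughly like the sketch below. It is not the updated code from the PR: the function name `degree_stats`, the returned keys, and the list-based inputs are all assumptions chosen so the example runs without dependencies.

```python
def degree_stats(senders, receivers, num_nodes):
    """Illustrative in/out/total degree statistics with an empty-graph guard."""
    # Guard: with zero nodes, max() and the mean below would fail.
    if num_nodes == 0:
        return {
            "num_nodes": 0,
            "num_edges": 0,
            "max_in_degree": 0,
            "max_out_degree": 0,
            "mean_total_degree": 0.0,
            "isolated_nodes": 0,
        }

    in_degree = [0] * num_nodes
    out_degree = [0] * num_nodes
    for s, r in zip(senders, receivers):
        out_degree[s] += 1
        in_degree[r] += 1
    total_degree = [i + o for i, o in zip(in_degree, out_degree)]

    return {
        "num_nodes": num_nodes,
        "num_edges": len(senders),
        "max_in_degree": max(in_degree),
        "max_out_degree": max(out_degree),
        "mean_total_degree": sum(total_degree) / num_nodes,
        # A node is isolated only if it has no incoming and no outgoing edges.
        "isolated_nodes": sum(1 for d in total_degree if d == 0),
    }


# Nodes 0-2 send, nodes 3-4 receive, node 5 has no edges at all.
stats = degree_stats([0, 1, 2], [3, 3, 4], num_nodes=6)
print(stats["isolated_nodes"])  # 1
```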

Development

Successfully merging this pull request may close these issues.

Add graph statistics and validation utilities for easier debugging

2 participants